ABSTRACT


A  unified  approach  to  the  design  and  evaluation  of  fast  algorithms  for  discrete  signal 
processing  is  developed.  Based  on  the  theory  of  finite  groups,  it  hence  includes  the 
familiar  cases  of  the  fast  Fourier  and  Walsh-Hadamard  transforms.  However,  the  use  of 
noncommutative  groups  reveals  a  large  variety  of  novel  methods.  Some  of  these  exhibit 
a  superior  performance,  as  measured  by  both  reduced  error  rates  and  computational 
complexity,  on  nonstationary  data. 

The  recent  history  of  this  subject  is  reviewed  first,  followed  by  a  detailed  examination 
of the three principal ingredients of the present study: the underlying groups, the signal-processing
tasks on which the group-based algorithms are to compete, and the signal models used to define the
data environment. Test results and conclusions then follow, the former being based on the use of
random correlation matrices.

1. INTRODUCTION


This preliminary section is intended to provide the minimal background necessary to motivate the
technical results to be presented below. Most of this motivation is set out in great detail in an earlier
Technical Report [1] (especially Section III therein), and it is assumed that serious readers are familiar
with that material. However, the present section will update [1] by restating the essential role of finite
groups in discrete signal processing and by reviewing some pertinent work of other researchers which,
in most cases, was not available when [1] was being prepared.

In subsequent sections of this report, we will detail the results of an effort to implement and
evaluate the program set forth in Section III of [1], namely the application of the harmonic analysis
associated with finite groups to certain standardized signal-processing tasks. The essential purpose is to
determine whether this somewhat exotic theory can lead to practical signal-processing algorithms that
improve on conventional techniques that use the fast Fourier transform. This particular transform is, of
course, associated with a particular class of finite groups, namely the cyclic groups. We shall eventually
conclude that its use can only be recommended for data derived from a stationary signal source; more
general nonstationary data are better processed by transforms associated with other groups.

1.1  A  GENERAL  APPROACH  TO  SIGNAL  PROCESSING 

For several years, I have been evolving both a technical and a philosophical approach to signal
processing. Here, I emphasize a particular technical aspect, as a sequel to Section III of [1]. For an
associated philosophical viewpoint, refer to Section I of that report, where I expressed the view that one
of the primary motivations for working on the foundational aspects of signal processing is to provide
answers to the basic problem of justifying numerical operations on a data set; that is, why do we do one
thing and not another?

Many of the possible answers to such questions are prosaic. A particular algorithm may be
employed because it is familiar, convenient (e.g., prepackaged), or fast. Whether or not it is the best thing
to do in a given context is often less clear. Indeed, there is always a trade-off among familiarity, speed
(e.g., complexity), and error. Even when an optimal procedure is known, its actual implementation may
be too computationally intensive for the desired application, and recourse to a faster but suboptimal
method must be made.

We can perceive a coarse partition of the broad spectrum of signal-processing activity, as depicted
in Figure 1. The spectrum ranges from highly theoretical studies (mathematical foundations, or MFSP),
to the design of corresponding reliable and efficient software, to the implementation of this code in
suitable hardware, to the final specific field usage of such devices, usually as part of a larger system.

Mindful of the fundamental problem of justifying operations on data, work in MFSP is usually
aimed at creating optimal algorithms for some particular data and design structure. But it may also
involve creating new algorithms (optimal or not) by use of some fairly exotic mathematics. As in
theoretical physics, this ever-increasing use of ever-deeper, more diverse areas of mathematics is the
source of much of the power and the intellectual excitement of this work.

Several quite recent examples of the high level of international interest and activity in MFSP
include the following conferences and workshops: the Conference on Mathematics in Signal Processing
at the University of Bath, September 1985; the SIAM Workshop on Mathematics of Systems and Signal
Processing at Stanford University, September 1987; and the summer Program on Signal Processing at
the Institute for Mathematics and Its Applications, Minneapolis, mid-1988. Proceedings of the
September 1985 conference have been published [2].

Returning momentarily to the context of Figure 1, we may say that the kind of application
motivating the work about to be described is a signal-processing device, operating within the confines of
a larger system, which has to repeatedly and rapidly process data either to extract information (e.g.,
to estimate an unobservable signal embedded in noise) or to eliminate redundancy. The ambient system
might then use the processor output for fire control or direction change in the first case, or for data
transmission or display in the second. In all such applications, the main question concerns the
trade-off between speed (efficiency, complexity, etc.) and error. The methods discussed below will illustrate
some possibilities for accepting small increases over optimum in system error rates, in return for
increases in computational efficiency.

To facilitate comparisons between competing algorithms, we will henceforth adopt a strictly neutral
approach toward applications, in the sense that the discussion will remain free of implementation
considerations and specific technologies. Hence, the algorithms will only be compared numerically and
statistically on standardized signal-processing tasks, such as data compression and minimum
mean-square (Wiener) filtering.

One final comment concerning the coarse structure of MFSP: We see this structure as triangular,
with "vertices" consisting of Hilbert space theory, probability and statistics, and abstract harmonic
analysis on groups. We must immediately acknowledge that this perceived structure is certainly not
all-inclusive, but is intended merely to give an approach to discerning some general principles and
methodology in a virtually infinite amount of technical detail.

The importance of Hilbert spaces is that these serve as domains of operators, which in turn model
various transformations of one signal into another. Operator theory has been intensively developed in
the last four decades, as has the more exotic theory of operator algebras, although the most fundamental
results (spectral theorem; Bochner and Stone theorems for positive definite functions and unitary
representations of the real line; Peter-Weyl theory for decomposing unitary representations of compact
groups; von Neumann double commutant theorem) date back to the late 1920s. However, most of the
operators with direct signal-processing application are of elementary structure (e.g., projections,
compact, and unitary operators) and many of the practical difficulties are of a computational nature (e.g.,
computing singular value and spectral decompositions, adjoints and pseudoinverses, approximation by
simpler operators, etc.).


Since real data are always accompanied by noise, "what is measured is not the truth," and,
therefore, "careful signal processors must be statisticians" [J. Tukey]. So probability and statistics
constitute our second fundamental "vertex." Combined with Hilbert space, this is the basic setup to model
second-order random variables and processes, (most of) the theories of Gaussian measures and abstract
Wiener spaces, mean-square prediction and filtering (conditional expectations), etc. Much of what is now
called linear inverse theory (recovering an unobservable signal passed through some channel or measuring
device, and observed in noise) can also be constructed in this context, as can the theory of optimal
algorithms and information-based complexity.

This statistical-linear approach to signal processing is by now quite standard, and not anything that
requires further commentary here. In [1], I attempted to make the case for harmonic analysis over not
necessarily commutative groups as a third foundational "vertex" of signal processing. In general, groups
permit us to take advantage of physical or temporal symmetry in a situation, and to produce orthogonal
expansions in functions that respect this symmetry. To this end, the vast theory of unitary
representations is available, especially the Peter-Weyl theory for compact groups. A familiar example is that of
stationary stochastic processes over locally compact commutative groups. In a different direction, I
emphasized in [1] the role of finite groups as offering a unified approach to fast unitary transforms, and
it is this theme that is developed below.

Finally, there are two points not made in [1] that should be inserted here. First, as mathematics
advances, we are starting to see an influx of novel nonorthogonal bases and expansion devices in
(separable) Hilbert space. We have in mind here, inter alia, Riesz bases (actually a rather classical
notion, defined as a bounded unconditional basis or, equivalently, an orthonormal basis transformed by an
automorphism) [3]; Dirac bases (a kind of continuous orthogonal basis in the context of "Sobolev
triples") [4]; and, inevitably, Dirac-Riesz bases [4], generalized coherent states, and "wavelets." These
last two concepts are very much associated with (square-integrable) group representations, appearing as
discrete subsets of certain orbits defined by an element of the associated Hilbert space known in each case
as the "analyzing wavelet." This kind of analysis, developed primarily by French mathematicians and
physicists [5, 6, etc.], is rather profound, especially when dealing with non-unimodular groups, and is
likely to be of real value in joint time-frequency approaches to signal processing. It extends the classical
work of Gabor (wavelets) and Wigner (distributions). It is not germane to give a more precise discussion
here, the point simply being yet another instance of group-theoretic methodology in signal processing.

Second, we should bear in mind that the early development of the general or abstract theory of
Hilbert space and operators thereon was greatly motivated by attempts to formulate a rigorous theory of
quantum mechanics. The chief architect of this fusion was, of course, John von Neumann [7]. For
example, the most fundamental of operator structure theorems, the spectral theorem, states that a not
necessarily bounded self-adjoint operator T can be synthesized from a family of orthogonal projections,
more precisely, from a projection-valued measure E (or resolution of the identity) on the real line:

$$T = \int_{-\infty}^{\infty} \lambda \, dE(\lambda).$$

The quantum mechanical interpretation of this formula is that for each system state, as described by a
unit vector x in the space of T, the probability measure

$$B \mapsto \langle E(B)x, x \rangle$$

describes the result of measuring the observable associated with T. The projections E(B) themselves
correspond to yes-no "questions" about the observable, so that it at least becomes plausible to expect
some sort of determination of such operators by a family of projections.

Now, the only point to be made here is that because of this close connection between the
mathematical foundations of quantum mechanics and the basic theory of Hilbert space, and because of our
heavy use of this theory in our own work on MFSP (not so much an issue in the present report), we
should expect some interesting connections between quantum mechanics and signal processing. In fact,
such connections are being noticed in various ways by several researchers, and we intend to discuss some
of these in later reports. Hopefully, an eventual unified theory of fundamental quantum mechanics,
pattern recognition, information theory, and signal processing will emerge (insofar as this is possible!).

1.2  THE  ROLE  OF  FINITE  GROUPS 

Let us now move directly to the essence of our subject: the uses of finite groups for certain
signal-processing applications. As suggested in [1], a distinction is to be made between the roles of finite and
infinite groups: representations of and analysis on the latter serve to define structure and analytic
models, such as expansions and transforms of idealized signals, while analysis on the former leads to
direct computational algorithms.

As discussed in [1], finite nonsimple groups permit a unified approach to fast unitary transforms
and discrete suboptimal filters. Of course, there are other nongroup-theoretic approaches to fast
transforms, and references were provided in [1] to such work (References 10, 15, and 47 of Section III
therein). Attempting a moderately reliable analogy, we may say that, at the next level of complexity,
the simple groups are to the family of all (finite) groups as the prime numbers are to the set of all positive
integers. In fact, each integer admits an essentially unique prime factorization, and each finite group
admits a composition series (that is, a decreasing chain of subgroups, each of which is a maximal normal
subgroup of its predecessor); the associated factor groups are then simple. The Jordan-Hölder theorem
states that any two composition series for a particular group are equivalent, in the sense that the simple
groups associated with the two series can be paired off isomorphically. Thus, each finite group
determines a unique list of simple groups. And the converse is, in a certain sense, also valid. That is, if we
know all simple groups (which is the case now [8]), then all finite groups can, in principle, be obtained
by use of the Schreier extension technique. Hence, all possible groups having a given list of composition
factors can be constructed [9], although not very efficiently. In particular, when this theory is applied
to the class of cyclic groups, we can recover the prime factorization of integers.
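
The remark about cyclic groups can be made concrete with a small sketch. It is an illustration only, not part of [1] or of this report: the function names are hypothetical, and the cyclic group of order N is represented simply by the orders of the subgroups in one composition series. Each step passes to a subgroup of prime index, so the factor groups are cyclic of prime order and their orders are the prime factors of N.

```python
def composition_series_orders(n):
    """Orders of the subgroups in one composition series of the cyclic group Z_n.

    Each step divides the current order by its smallest prime factor, which
    corresponds to passing to a maximal (normal) subgroup of prime index.
    """
    orders = [n]
    p = 2
    while n > 1:
        if n % p == 0:
            n //= p
            orders.append(n)
        else:
            p += 1
    return orders


def composition_factors(n):
    """Orders of the simple factor groups; these are the prime factors of n."""
    orders = composition_series_orders(n)
    return [orders[i] // orders[i + 1] for i in range(len(orders) - 1)]


if __name__ == "__main__":
    print(composition_series_orders(96))
    print(composition_factors(96))
```

A different composition series for the same group would list the same primes in a different order, which is the content of the Jordan-Hölder theorem in this special case.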

Now, like the integers, the finite groups are elementary mathematical objects and exist
independently of any further theories, or of any particular signal-processing application. It was shown in [1]
that the complexity of the group transform $F_G$, associated with a particular group G of order N, is related
to the subgroup structure of G, especially to the length of a composition series for G: the greater this
length, the greater the possibility of reducing the complexity of $F_G$. The best case has the complexity of
$F_G$ reduced from $O(N^2)$ to $O(N \log_2 N)$, as with the familiar fast Fourier or fast Walsh transforms on the
cyclic (resp., dyadic) groups of order $N = 2^n$.
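
As an illustration of the best case, the following sketch compares the direct $O(N^2)$ Walsh-Hadamard transform on the dyadic group of order $N = 2^n$ with the usual butterfly algorithm, which uses $n$ stages of $N/2$ add-subtract pairs. The function names are illustrative, the transform is left unnormalized, and the Hadamard (natural) ordering of the rows is an assumption made here for simplicity.

```python
def wht_direct(x):
    """Direct O(N^2) Walsh-Hadamard transform in Hadamard (natural) order.

    Entry (i, j) of the transform matrix is (-1) raised to the number of
    binary digits that i and j have in common.
    """
    n = len(x)
    return [
        sum(x[j] * (-1) ** bin(i & j).count("1") for j in range(n))
        for i in range(n)
    ]


def fwht(x):
    """Fast Walsh-Hadamard transform: log2(N) stages of N/2 butterflies."""
    a = list(x)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                u, v = a[i], a[i + h]
                a[i], a[i + h] = u + v, u - v
        h *= 2
    return a


if __name__ == "__main__":
    data = [1, 0, 1, 0, 0, 1, 1, 0]
    print(fwht(data) == wht_direct(data))
```

Applying `fwht` twice returns the input scaled by N, so the same routine serves as its own inverse up to normalization.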

The complexity issue for various group transforms and their inverses, the latter not necessarily
being group transforms, has been rather extensively examined in recent literature and will be reviewed
in more detail in Subsection 1.3. On the other hand, this is not the only issue of interest in regard to
signal-processing applications. There is the further problem of deciding which of the (often many)
groups of a given order is most appropriate to a particular signal-processing task. That is, there is the
fundamental trade-off between speed, in the sense of reduced complexity, and the error incurred as the
result of using a particular group transform or filter in place of the optimum transform or filter.

We must be precise here. Since we are interested in evaluating and comparing the performance
of different groups in a signal-processing context, it is necessary to specify four variables, namely:

•  Data  dimension 

•  Group 

•  Task  (with  performance  measure) 

•  Signal  model. 

In  regard  to  the  first  two  variables  we  naturally  require  that 

data dimension = order(group).

The  latter  three  variables  are  discussed  in  detail  in  Sections  2  through  4  below.  When  all  four  variables 
have  been  fixed  (as  indicated  in  the  later  sections),  then  a  definite  question  can  be  asked,  and  thus  group 
performance  in  various  situations  can  be  contrasted. 

In further regard to the relation between the first two variables, we have two alternatives. First,
we can fix a data dimension N and then consider some (possibly all) groups of order N. This rapidly
becomes difficult as N increases, especially for the most important cases of highly composite N (such
as N = 16, 32, 48, 64, 96, ...) which are, as already noted, exactly the cases where the greatest reduction
in complexity is to be expected. Indeed, for the values of N just mentioned there are, respectively, 14,
51, 52, 267, 230, ... distinct group isomorphism types of that order. Hence, a second approach has
proved useful: to study the asymptotic performance of a sequence of groups of orders $N_k$, as $k \to \infty$.
Familiar examples of such sequences are the cyclic groups of order $N_k = k$, the dihedral groups of order
$N_k = 2k$, and the dyadic groups of order $N_k = 2^k$. Relevant properties of these groups, and of a fourth
sequence, will be discussed in Section 2 below.

We shall look at three canonical signal-processing tasks (data compression, data decorrelation, and
Wiener filtering) and their associated performance measures, with which we will challenge the various
groups. Other tasks and measures could have been selected (a rate-distortion function, an entropy
criterion, resolution ability, etc.), but we had to start, and stop, somewhere, and the three tasks that we
have elected to work with are familiar and neutral, that is, free of application or hardware specifics.

The choice and analysis of various performance measures and their interrelationships, especially
in connection with compression and decorrelation, are a little more subtle than might be expected at first
and, in fact, are currently under further investigation by the author. This will be discussed further in
Section 3 below. But we can note here that all the performance criteria utilized in this report have the
virtue of being rather simple functions of the data covariance matrix. (Actually, our signal models will
be restricted so that all components of each signal used as a test case for any of our algorithms will have
the same variance; hence, we may assume that all covariance matrices are of correlation type, that is,
have their diagonal entries equal to unity.) From this it follows that a choice of signal model may be
reduced to a choice of a random correlation matrix. This happens to not be a particularly obvious thing
to do and, in fact, leads to a host of interesting problems which are described and, in some cases, solved
in two other reports [10, 11]. A summary of the essential points of this work is given in Section 4 below.

Let us now return once more to the general role of finite groups in discrete signal processing.
Suppose that we are given a random vector x, of dimension N, with E(x) = 0. For any particular group
G, of order N, we can think about the fundamental idea in two equivalent ways. First is the approach
described at length in [1], namely to view x as a function in $L^2(G)$, and to expand x in the special
orthonormal basis defined by the irreducible representations of G. The mapping $F_G$ which assigns to x its
Fourier coefficients with respect to this basis is exactly the group transform on $L^2(G)$, and by Parseval's
theorem is a unitary operator there. It is sometimes more useful to group these coefficients together in
sets of size $d_i^2$, where $d_i$ = dimension of the i-th irreducible representation of G, so that $F_G$ can be
considered to map $L^2(G)$ onto $L^2(\hat{G})$, where $\hat{G}$ is the dual object of G consisting of all the
irreducible representations of G.

Alternatively, we maintain the natural view of x as an element of the N-dimensional Hilbert space
$\ell^2(N)$, and think of G as a symmetry group of the set {1, 2, ..., N}, that is, as a subgroup of the
symmetric group on N letters (Cayley's theorem). In turn, G may be realized as a group of N × N permutation
matrices. If each point of {1, 2, ..., N} is considered to have measure 1/N, then each N × N permutation
matrix can be considered to be a measure-preserving transformation of the trivial measure space
{1, 2, ..., N}. So we have a unitary representation of G on the space $\ell^2(N)$. By the usual general theory,
this representation can be decomposed into a direct sum of its irreducible components, and so $\ell^2(N)$
correspondingly splits into a direct sum of invariant subspaces. By selecting an orthonormal basis for each
of these subspaces, and expanding x in the resulting coordinates, we obtain a decomposition equivalent
to that of the first approach.

In the case of a nonabelian G, there is some unavoidable non-uniqueness in the choice of these
bases that goes beyond a mere rearrangement. This is due to the presence of at least one irreducible
representation of dimension $\geq 2$. Hence, the group transform $F_G$ is only unique up to a unitary equivalence.

For a given data dimension N, the various group transforms (defined by the groups of order N)
formalize the intuitive notion of the passage from physical to spectral coordinate domains. Each such
transform is to be thought of as a potential approximant of the discrete Karhunen-Loève transform
(DKLT) associated with a particular statistical signal class. We contemplate using such approximants
because they are fast to compute (depending on the group, and relative to a general N × N unitary
transform), and because we may not know the exact statistical nature of our data.

Similarly, group filters, defined abstractly as right convolution operators on $L^2(G)$, are to be
thought of as potential approximants to some specific operator on $L^2(G)$ which has arisen in a
signal-processing context. The example that we emphasize in this report is the Wiener-filter operator, which
is the optimal solution (in the sense of minimum mean-square error) to the problem of estimating a
random signal with known covariance structure that has been observed in additive white noise
(uncorrelated with the signal). By passing to the spectral domain via the group transform, we can obtain a fast
algorithm for such operators, just as is done in the classical case of computing digital filters by means
of multiplying the corresponding transfer functions in the (fast) Fourier domain. Also discussed at length
in [1] was the problem of choosing the optimal filter, for a fixed G, to best approximate a particular
operator such as a Wiener filter. As usual, we are interested in the trade-off between the reduced complexity
of the optimum group filter (at least for certain groups, as discussed below) and the increase in error over
the theoretical optimum.

1.3  RECENT  WORK  OF  OTHER  AUTHORS 

Before  delving  into  the  technical  details  of  our  own  work,  it  seems  useful  to  briefly  survey  some 
other  related  approaches  to  the  main  problems.  This  is  done  primarily  for  balance  and  completeness, 
and we will concentrate on work not already referenced in our preceding report [1].

Let us first consider the problem of approximating the DKLT of a random vector x with E(x) =
0. This is the essential issue in data compression and decorrelation, since this transform has the desired
optimality properties of maximum variance-packing and complete decorrelation, among others relevant
to feature selection for pattern recognition [12].

Since the spectrum of a Hermitian operator such as the covariance operator of x is unique only up
to order and multiplicity, the DKLT of x is not uniquely specified. At the very least, namely when the
eigenvalues of the covariance operator are distinct (i.e., have multiplicity one), any unitary matrix U that
diagonalizes a matrix representation of this operator can be replaced by UQ, where Q is a permutation
matrix. So, when we talk about approximating the DKLT we shall be referring to a unitary matrix U
with the property that U*AU is "approximately diagonal," where A is a covariance matrix of x. Of
course, this phrase needs to be carefully defined and, in fact, is a somewhat subtle problem, as already
mentioned. We will touch on this in Section 3.
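
To give a feeling for what "approximately diagonal" might mean numerically, the sketch below uses one simple candidate measure, the fraction of the squared Frobenius norm of U*AU that lies off the diagonal. This measure is an assumption chosen for illustration and is not necessarily one of the criteria defined in Section 3; the test matrix (a first-order Markov correlation matrix with parameter `rho`) and all function names are likewise hypothetical. The transform U is the orthonormal 4 × 4 Walsh-Hadamard matrix.

```python
import math


def hadamard(n):
    """Orthonormal n x n Walsh-Hadamard matrix (natural order), n a power of two."""
    return [
        [(-1) ** bin(i & j).count("1") / math.sqrt(n) for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def conjugate(u, a):
    """Return U^T A U for a real orthogonal U (the real case of U*AU)."""
    u_t = [list(col) for col in zip(*u)]
    return matmul(matmul(u_t, a), u)


def offdiag_fraction(m):
    """Share of the squared Frobenius norm carried by off-diagonal entries."""
    total = sum(v * v for row in m for v in row)
    diag = sum(m[i][i] ** 2 for i in range(len(m)))
    return (total - diag) / total


def markov_correlation(n, rho):
    """Correlation matrix with entries rho^|i-j| (unit diagonal)."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    a = markov_correlation(4, 0.9)
    b = conjugate(hadamard(4), a)
    print(offdiag_fraction(a), offdiag_fraction(b))
```

Because U is unitary, the total energy and the trace are unchanged; only the split between diagonal and off-diagonal entries moves, which is why a ratio of this kind is a natural first thing to look at.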

In general, we can discern at least three approaches to approximating the DKLT, which we will
term "direct," "asymptotic," and "fixed suboptimal." Let us say a few words about each of these in turn.
The reader should keep in mind that there are always two conflicting aspects of this problem: the
accuracy of the approximation, and its complexity.

The direct approach deals with an explicitly known covariance matrix A, which is associated with
a vector of samples from a weakly stationary stochastic process. Hence, A is also a Toeplitz matrix. A
method developed by A. Solodovnikov [13, in Russian] leads to a fast algorithm for the exact DKLT
of such a matrix in the case where its dimension N has the form $2^n$. The basic idea is to factor the DKLT
matrix as the product of n sparse matrices, so that each element of A can be expressed as a product of
just one element of each factor. The proof is carried out by induction on n, beginning with the trivial
case where n = 1:


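$$
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} \sigma & \rho \\ \rho & \sigma \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
= 2 \begin{pmatrix} \sigma + \rho & 0 \\ 0 & \sigma - \rho \end{pmatrix}
$$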

Notice that in this case the diagonalizing matrix is just the 2 × 2 Walsh-Hadamard (W-H) transform.
In the general induction step the $2^n \times 2^n$ W-H transform is used, along with a permutation, to reduce
A to block diagonal form to which, in turn, the induction hypothesis can be applied.
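
The n = 1 case is easy to check numerically. The sketch below is only a verification of that base case, with hypothetical names: a 2 × 2 symmetric Toeplitz matrix with diagonal entry `sigma` and off-diagonal entry `rho` is conjugated by the unnormalized 2 × 2 W-H matrix. It does not attempt the general induction step of [13].

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Unnormalized 2 x 2 Walsh-Hadamard matrix.
WH2 = [[1, 1], [1, -1]]


def wh_conjugate(sigma, rho):
    """Return WH2 * A * WH2 for the 2 x 2 Toeplitz matrix A = [[sigma, rho], [rho, sigma]]."""
    a = [[sigma, rho], [rho, sigma]]
    return matmul(matmul(WH2, a), WH2)


if __name__ == "__main__":
    # Expect twice diag(sigma + rho, sigma - rho).
    print(wh_conjugate(3, 1))
```

The diagonal entries that come out, up to the factor 2, are the two eigenvalues of the 2 × 2 covariance matrix.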

A second direct approach has been suggested in the Russian literature by I. Kaporin [14], under
the alternate assumption that the covariance matrix A is tridiagonal. This is of course a significant
restriction since, otherwise, to first put A in this form (say by Householder reduction) requires $O(N^3)$
operations. But when A has this special form, recursive linear computations can be defined to yield an
$O(N \log^2 N)$ operations count in effecting the DKLT of a vector.

Next we turn to the asymptotic approach. Once again we consider samples $(x_1, \ldots, x_N)$ drawn from
a weakly stationary stochastic process, but without a definite bound on N. The corresponding covariance
matrices will again be Toeplitz. Early treatments of this topic [15, 16] proceeded by making some
assumption about the rate of decrease of the autocorrelation function of the process, and then showing
that some particular sequence of unitary transforms of interest, such as the sequence of N-point DFTs,
would asymptotically decorrelate the N × N covariance matrices. Depending on the precise nature of
the assumptions, this last phrase could mean either that some fixed correlation between two of the
transformed components of $(x_1, \ldots, x_N)$ tends to 0, or that some composite measure of all these
correlations does so.

More recently, M. Unser [17] has shown that several popular fast unitary transforms (DCT, DFT,
DOFT, ...) achieve asymptotically the effect of the DKLT, provided such effects are carefully described
by appropriate performance measures, namely as separable convex or concave functions of the variances
of the transformed samples. The analysis nicely makes use of a classical theorem on the asymptotic
distribution of the eigenvalues of Hermitian Toeplitz matrices, due to Grenander and Szegő, as adapted
by Gray [18, Thm. 2.3]. To my knowledge, this result of Unser's is the most general result
presently known on asymptotic approximation of the DKLT for Toeplitz covariance matrices.

Lastly, we consider the fixed suboptimal approach. This is appropriate in all cases not yet covered,
that is, when the problem size is fixed but the exact signal covariance is not known (or, if known, is not
Toeplitz). In this case, we can attempt to measure the performance of some fixed N-dimensional unitary
transform over a class of N x N covariance matrices. By doing this repeatedly, we could hope to rank
the transforms relative to the fixed performance measure and covariance class. Of course, they could
also be ranked relative to computational complexity, which can be defined independently of any
signal-processing task.

Along with M. Karpovsky and E. Trachtenberg [19, 20], I also have adopted this sort of approach.
All this activity involves the notion of transforms (and filters) defined by finite groups, the general theory
of which was set down at length in [1]. Actually, the most recent work of Trachtenberg [21, 22] utilizes
only abelian groups. In [21], for example, such a group G of order N is given, and then a class of
so-called Frobenius group matrices is introduced. Such a matrix A is defined by specifying first a scalar
function f on G and then defining

$$a_{ij} = f(g_i \oplus g_j)$$

where $G = \{g_i : i = 0, 1, \ldots, N-1\}$; A turns out to be unitary exactly when $f(g)\, f(g^{-1}) = 1$, for all
$g \in G$. The associated matrix transforms can be computed by fast algorithms in many ways, depending
on how G is factored into a product of (cyclic) subgroups, and on the support of f. Any remaining
degrees of freedom in the choice of f can be utilized to relate the performance of the A-transform to a
given signal-processing task, such as DKLT approximation. In this fashion, a whole spectrum of trades
between complexity and error can be attained.

Our final comments in this section pertain to the issue of group transform complexity. This issue
was discussed at some length in [1], primarily for the case of abelian groups. For such groups a
satisfactory theory exists, due to the facts that all irreducible representations are one-dimensional
(characters) and that the dual object has a group structure (dual group), so that the inverse transform is
again a group transform. For nonabelian groups, the only result I knew of at the time [1] was written
(due to Karpovsky [23]) quantified the complexity of the direct and inverse group transform as
$O[\mathrm{ord}(G) \sum_i \mathrm{ord}(G_i)]$, when the group G can be decomposed into a direct product of groups $G_i$.
This same result was established independently by M. Atkinson [24]. More recently, this issue has been
investigated by T. Beth [25] for the much more general case of solvable groups. For such a group G,
there is a composition series

$$\{e\} = G_{r+1} \subset G_r \subset \cdots \subset G_1 = G$$

with each $G_k$ a normal subgroup in $G_{k-1}$ of prime index $p_k$. Then, the computational complexity of each
transform can be bounded by $\mathrm{ord}(G) \left( \sum_k p_k \right) + O(\mathrm{ord}(G)^{\omega/2})$ steps, where $\omega < 3$ is the exponent for which
N x N matrices can be multiplied in $O(N^{\omega})$ arithmetical steps.

Of course, not all groups of potential interest in statistics and engineering are solvable. Perhaps
the foremost example is the symmetric group $S_n$ on n letters. For this group, a special analysis of
transform complexity has been made by Diaconis and Rockmore [26], who show the possibility of
reducing the usual $O[(n!)^2]$ count to $n (n!)^{\omega/2}$.

2. THE GROUPS


Recalling the preliminary discussion in Section 1.2, we now indicate precisely the four classes of
finite groups used in the present work. These are the cyclic groups, the dyadic groups, the dihedral
groups, and, for lack of a more traditional terminology, the "Russian groups." There is a cyclic group
$C_n$ of every order n, while the other types exist only for restricted orders as indicated parenthetically in
Table 1.


TABLE 1

Groups Used in This Study

Commutative        Noncommutative

Cyclic (n)         Dihedral (2n)

Dyadic (2^n)       "Russian" (2^n)

Three of these classes of groups have a geometric significance, while the Russian groups are more
abstract, being in general only defined in terms of generators and relations. Thus, the cyclic group $C_n$
can be identified with the group generated by the counterclockwise rotation of the plane through an
angle of $2\pi/n$ radians. The dyadic group $D_n$ is the set of all n-dimensional binary (0,1) vectors, with
componentwise addition mod 2. Thus, equivalently, $D_n = C_2 \times \cdots \times C_2$, n times, and we can think of $D_n$
as constituted of the vertices of the unit hypercube in $\mathbb{R}^n$. The dihedral group $Di_n$ is the group of
symmetries of the regular n-gon. It is generated by a rotation a and a reflection b so that
$a^n = b^2 =$ identity, and $aba = b$. Thus, $Di_n$ contains $C_n$ as a normal subgroup and is of order 2n. It is the
semidirect product of $C_n$ and $C_2$, and is hence the simplest example of a noncommutative group.
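
The dihedral relations quoted above can be checked mechanically. The following is a minimal sketch and not part of the report's own program: it assumes each group element is encoded as a pair (r, s) standing for the word a^r b^s, and all function names are illustrative.

```python
def dihedral_mul(x, y, n):
    """Product of x = a^r b^s and y = a^t b^u in the symmetry group of the regular n-gon."""
    r, s = x
    t, u = y
    # The relation aba = b gives b a = a^(-1) b, so moving b^s past a^t
    # negates t whenever s = 1.
    sign = -1 if s else 1
    return ((r + sign * t) % n, (s + u) % 2)


def dihedral_elements(n):
    """All 2n elements a^r b^s with 0 <= r < n and s in {0, 1}."""
    return [(r, s) for s in (0, 1) for r in range(n)]


def power(x, k, n):
    """k-fold product of x with itself (k >= 0)."""
    result = (0, 0)  # identity element
    for _ in range(k):
        result = dihedral_mul(result, x, n)
    return result


n = 5
identity, a, b = (0, 0), (1, 0), (0, 1)
elements = dihedral_elements(n)

order = len(elements)                              # group order, 2n
aba = dihedral_mul(dihedral_mul(a, b, n), a, n)    # should equal b
ab = dihedral_mul(a, b, n)
ba = dihedral_mul(b, a, n)                         # differs from ab: noncommutative
print(order, power(a, n, n) == identity, power(b, 2, n) == identity, aba == b, ab != ba)
```

The pair encoding works because every element of the group can be written uniquely as a rotation followed by at most one reflection, which is also why the order is 2n.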

Finally, the "Russian groups" form a sequence $\{BG_n\}$ of groups of order $2^n$ defined as follows.
If n is even, $BG_n = BG_{n-1} \times C_2$. If $n = 2k + 1$, then $BG_n$ has generators $\{a, b_1, \ldots, b_{2k}\}$ obeying the
relations

$$a b_i = b_i a ,$$

$$b_{2i-1} b_{2i} = b_{2i} b_{2i-1} a , \quad i = 1, \ldots, k ,$$

$$b_i b_j = b_j b_i , \quad \text{otherwise}.$$

It happens that $BG_3 = Di_4$, but otherwise the Russian groups are not dihedral.


The Russian groups were introduced into the signal-processing literature by Berman and Grushko
[27]. The major thrust of this article was to prove that the computational complexity of the associated
group transform is $0.75\, N \log_2 N + o(N)$, as $\mathrm{ord}(BG_n) \to \infty$. Thus, the effort to compute the BG
transform is about 1/2 (resp. 3/4) that of the familiar FFT (resp. W-HT). The proof is based on a careful
analysis of the unitary dual $\widehat{BG}_n$; this set contains just characters and either one or two irreducible
representations of dimension $2^k$ according as $\mathrm{ord}(BG_n) = 2^{2k+1}$ or $2^{2k+2}$. The commutator subgroup of
$BG_n$ is the group $C_2$, and its abelianization $BG_n/C_2$ (the largest commutative quotient group of $BG_n$) is
the dyadic group $D_{n-1}$.
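
To make the quoted ratios concrete, the leading-order counts can be tabulated. This is only a sketch: the coefficient 0.75 for the BG transform is the one stated in the text, whereas the coefficients 1.0 for the W-HT and 1.5 for the FFT are assumptions inferred from the stated ratios of 3/4 and 1/2, and the o(N) term is dropped.

```python
import math


def leading_operation_counts(N):
    """Leading-order operation counts for an N-point transform, N a power of 2."""
    n_log_n = N * math.log2(N)
    return {
        "BG": 0.75 * n_log_n,    # stated in the text (o(N) term dropped)
        "W-HT": 1.0 * n_log_n,   # assumption: implied by the 3/4 ratio
        "FFT": 1.5 * n_log_n,    # assumption: implied by the 1/2 ratio
    }


counts = leading_operation_counts(32)
for name, value in counts.items():
    print(f"{name:5s} {value:7.1f}")
```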

We note that the authors of [27] did not attempt to evaluate the role that their groups might play
in signal-processing applications beyond that of reduced complexity of the associated transform. Looking
ahead slightly, we will include such an evaluation below and eventually deduce that, in terms of standard
performance measures for the SP tasks that we consider, the performance of the Russian groups is
comparable to that of the dyadic groups.

As a matter of specificity we record next the general form of the unitary matrix of the group
transform of $BG_n$, $n = 2k+1$, for a particular choice and ordering of the basis vectors in the spaces of
the various irreducible representations. Let

$$H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \quad \text{and} \quad H_m = H_{m-1} \otimes H_2 , \ \text{for } m \geq 3 .$$

These are the well-known Hadamard matrices. Then, the group transform matrix of $BG_n$ is, up to a
normalizing constant,


cH 


-cH 


cH 


-cH 


cH 


“)k 


-cH 


where  r  =  2  '. 
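
The Hadamard matrices entering this construction are easy to generate and to apply quickly. The sketch below is illustrative only: it indexes the matrices by the number of Kronecker factors of H_2 (an assumption made for convenience, which may differ from the subscript convention above), and it uses the standard butterfly form of the fast Walsh-Hadamard transform rather than anything specific to the BG transform.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


H2 = [[1, 1], [1, -1]]


def sylvester_hadamard(levels):
    """Hadamard matrix of size 2**levels built from `levels` Kronecker factors of H2."""
    H = [[1]]
    for _ in range(levels):
        H = kron(H, H2)
    return H


def fwht(x):
    """Fast Walsh-Hadamard transform (unnormalized); len(x) must be a power of 2."""
    x = list(x)
    h = 1
    while h < len(x):
        # Each pass applies H2 to pairs of entries that are h apart.
        for start in range(0, len(x), 2 * h):
            for j in range(start, start + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x


H8 = sylvester_hadamard(3)
x = [3, 1, 4, 1, 5, 9, 2, 6]
direct = [sum(h * v for h, v in zip(row, x)) for row in H8]   # N^2 multiply-adds
fast = fwht(x)                                                # N log2 N additions
gram = [[sum(p * q for p, q in zip(r1, r2)) for r2 in H8] for r1 in H8]
rows_orthogonal = all(gram[i][j] == (8 if i == j else 0) for i in range(8) for j in range(8))
print(direct == fast, rows_orthogonal)
```

Dividing by the square root of the matrix size turns each of these matrices into a unitary (here real orthogonal) transform.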

Let us use the phrase "superfast group" for any group whose associated group transform can be
computed "significantly faster" than the usual FFT. Without attempting to be overly precise about the
meaning of this phrase, let us agree that the groups $\{BG_n\}$ are at least asymptotically superfast. Open
questions now abound: Are there other superfast groups? How can they be recognized or constructed?
And what is the limit of complexity reduction that is possible? There are also the important ancillary
questions of such groups' value, if they exist, for particular SP tasks.

We have already remarked on the rapid growth of the number $N_k$ of distinct groups of order k when
k is a highly composite integer of the form $k = 2^n$ or $k = 3 \times 2^n$ (cf. Subsection 1.2). Table 2 illustrates
this remark even more dramatically for the prime power orders $k = 2^n$.


TABLE 2

Groups of Order 2^n

k        N_k

16       14

32       51

64       267

128      2,328

256      56,092

512      >8,400,000

Since the groups of order $k = 2^n$ are those for which maximum complexity reduction is to
be expected, we must concentrate on these. Obviously, the numbers in Table 2 preclude a case-by-case
analysis, so that special constructions, as were done for the Russian groups case, will be required. On
the other hand, the sheer size of the numbers $N_k$ in Table 2 suggests a degree of a priori probability for
the existence of other classes of superfast groups. (We might term this argument one of "ample
opportunity," faintly analogous to that for the existence of extraterrestrial life based on the large number
of stars in our galaxy.) There is a field known as computational group theory whose techniques will
likely be of use in this matter.

3. GENERIC SIGNAL-PROCESSING TASKS
AND THEIR PERFORMANCE MEASURES


We  now  want  to  begin  consideration  of  how  the  various  group  transforms  and  optimal  group  filters 
can  be  compared.  The  most  direct  approach  is  to  simply  let  them  operate  in  controlled  statistical 
environments  and  monitor  the  results  from  various  SP  tasks.  Specification  of  the  latter  is  our  present 
concern.  In  order  not  to  appeal  to  any  possible  bias  on  the  reader's  part,  we  will  limit  attention  to  generic 
SP  tasks,  without  regard  to  hardware  implementation  or  specific  technology  application. 

Proceeding from this intent, we next list several such SP tasks; others will, of course, occur to
knowledgeable readers:

Compression 

Decorrelation 

Filtering  (estimation) 

Detection 

Discrimination 

Pattern  recognition 

Resolution 

Simulation. 

Clearly  some  of  these  tasks,  such  as  filtering  and  pattern  recognition,  are  very  general,  and  considerable 
further  specification  is  necessary  to  arrive  at  a  well-defined  problem  to  which  our  particular  group- 
theoretic  techniques  can  be  applied. 

For present purposes, we have decided to restrict attention to the first three applications listed
above. Each of these is discussed in detail in the earlier report [1]; relevant portions of that discussion
are reviewed next. The key point is that in each case a natural performance measure is given as a
function of the signal covariance. This fact permits us to neatly generate test signals, as we will see in
the next section.

The compression and decorrelation tasks can be treated from a common viewpoint, which is called
"transform coding" in the engineering literature. A block x of N samples is formed from a signal of
constant variance (perhaps, but not necessarily, stationary), and transformed by a unitary matrix U:
y = Ux. The covariance matrices of x and y are related by

$$\Sigma_y = U \Sigma_x U^* . \qquad (3.1)$$

In a complete SP system, the transformed samples $\{y_i\}$ might now be individually quantized prior to
transmission over some channel and eventual reconstruction at the receiver. However, here we are
primarily interested in the statistical behavior of the original or "physical" coordinates $\{x_i\}$ vis-a-vis that
of the transformed or "spectral" coordinates $\{y_i\}$.

In general we may say that compression is determined by some measure of the size of the diagonal
of $\Sigma_y$, and decorrelation by some measure of the off-diagonal entries. A more elegant approach would
in fact use the same measure for decorrelation as for compression, but this is a matter still under
investigation. So, we will stay with the more ad hoc approach for now.

For notational simplicity, let us set $P = \Sigma_x$ and $Q = \Sigma_y$. We will assume that P is a correlation
matrix, so that $P_{ii} = \mathrm{var}(x_i) = 1$, $i = 1, \ldots, N$. It then follows that

$$\mathrm{tr}(Q) = \mathrm{tr}(P) = N .$$

In this study, we choose to measure compression either as a plot of the fraction of total variance
in the largest m spectral coordinates vs m, $1 \leq m \leq N$, or by the single number $\gamma_Q$, which is the geometric
mean of diag(Q). This measure $\gamma_Q$ is interesting for a couple of reasons. First, its logarithm

$$\log \gamma_Q = \frac{1}{N} \sum_{i=1}^{N} \log (q_{ii})$$

is a simple example of a concave symmetric function of the $\{q_{ii}\}$. As such, it is part of a more general
theory of optimizing compression by minimizing concave (resp., maximizing convex) symmetric
functions of the spectral variances. Second, its use is motivated by work in information and rate distortion
theory; in particular, the result that the average distortion from an optimal bit assignment in an
independent quantization of $\{y_i\}$, subject to the constraint of a given average bit rate, is proportional to
$\gamma_Q$ [28].

As general facts about $\gamma_Q$, we note that $\gamma_Q \leq 1$ as long as P is a correlation matrix, and that min
$\gamma_Q$ is achieved when the transform matrix U is a discrete Karhunen-Loeve matrix for P. It can also
be shown that $\gamma_Q$ can be arbitrarily near to 0 for some correlation matrix P.
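
A small worked case may help fix ideas. The sketch below is an illustration under stated assumptions, not the report's FORTRAN code: it takes a hypothetical 2 x 2 correlation matrix with off-diagonal entry r = 0.6, uses a real orthogonal transform (so that U* is simply the transpose), and compares the geometric mean of the diagonal under the identity with that under the eigenvector (Karhunen-Loeve) transform. The helper names are illustrative.

```python
import math


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def transformed_correlation(U, P):
    """Q = U P U^T for a real orthogonal U (the real case of Equation (3.1))."""
    return matmul(matmul(U, P), transpose(U))


def geometric_mean_of_diagonal(Q):
    """gamma_Q: geometric mean of the diagonal entries (spectral variances) of Q."""
    n = len(Q)
    return math.exp(sum(math.log(Q[i][i]) for i in range(n)) / n)


r = 0.6
P = [[1.0, r], [r, 1.0]]                      # hypothetical 2 x 2 correlation matrix
identity = [[1.0, 0.0], [0.0, 1.0]]
c = 1.0 / math.sqrt(2.0)
klt = [[c, c], [c, -c]]                       # eigenvectors of P (also the 2-point W-HT)

gamma_identity = geometric_mean_of_diagonal(transformed_correlation(identity, P))
Q = transformed_correlation(klt, P)           # diag(1 + r, 1 - r) up to rounding
gamma_klt = geometric_mean_of_diagonal(Q)
trace_Q = Q[0][0] + Q[1][1]                   # stays equal to N = 2
print(gamma_identity, gamma_klt, trace_Q)
```

In this example the identity leaves the measure at its maximum value 1, while the Karhunen-Loeve transform lowers it to sqrt(1 - r^2) = 0.8 with the trace unchanged.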

To measure the amount of decorrelation achieved by the transform U, we will adopt the ad hoc
quantity

$$100 \left( 1 - \frac{|||Q|||}{|||P|||} \right) , \qquad (3.2a)$$

where

$$|||A||| = \sum_{i \neq j} |a_{ij}| . \qquad (3.2b)$$

This quantity normally ranges between 0 and 100, with 100 meaning that the spectral components have
become decorrelated, and 0 meaning no change in overall correlation structure. However, as we will
see, it can happen that $|||Q||| > |||P|||$ for some ill-chosen transform U. In such a case, the above
performance measure will become negative, indicating that the transform operation has increased the
overall correlation between data coordinates.
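
The two endpoints of this scale can be reproduced on a small example. The sketch below is illustrative: it assumes the measure is 100 (1 - |||Q|||/|||P|||) with |||A||| the sum of off-diagonal magnitudes, and it uses a hypothetical 4 x 4 correlation matrix that is invariant under the dyadic group, so that the 4-point Walsh-Hadamard transform diagonalizes it exactly.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def off_diagonal_sum(A):
    """|||A|||: sum of the magnitudes of the off-diagonal entries."""
    n = len(A)
    return sum(abs(A[i][j]) for i in range(n) for j in range(n) if i != j)


def decorrelation_percent(P, Q):
    """100 * (1 - |||Q||| / |||P|||); assumes P has some nonzero off-diagonal entry."""
    return 100.0 * (1.0 - off_diagonal_sum(Q) / off_diagonal_sum(P))


# Hypothetical test matrix: P[i][j] = f[i XOR j] is invariant under the dyadic group.
f = [1.0, 0.5, 0.3, 0.2]
P = [[f[i ^ j] for j in range(4)] for i in range(4)]

# Normalized 4-point Walsh-Hadamard matrix (real orthogonal).
signs = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
U = [[0.5 * s for s in row] for row in signs]
Q = matmul(matmul(U, P), transpose(U))

full = decorrelation_percent(P, Q)   # close to 100: spectral components decorrelated
none = decorrelation_percent(P, P)   # 0: the identity transform changes nothing
print(full, none)
```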

We turn now to a precise formulation of the filtering task. The simplest situation is classical mean
square (Wiener) filtering of a signal observed in uncorrelated white noise. That is, we are given
N-dimensional data

$$x = s + w$$

with $E(s w^*) = 0$, and noise covariance matrix $\Sigma_w = \lambda I$ ($\lambda > 0$, I = N-dimensional identity matrix). The
problem is to form a linear estimate

$$\hat{s} = L(x)$$

to minimize the error

$$E(\|s - \hat{s}\|^2) = e(L) = \mathrm{tr}[(I - L)\Sigma_s] . \qquad (3.3)$$

The solution $W = \arg\min e(\cdot)$ is the standard (discrete) Wiener filter

$$W = \Sigma_s (\Sigma_s + \lambda I)^{-1} . \qquad (3.4)$$
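
Equations (3.3) and (3.4) can be exercised numerically on a small case. The following is a sketch under stated assumptions, not the report's code: the 2 x 2 signal correlation matrix and the value of the noise variance are hypothetical, everything is real-valued, and the function `mean_square_error` uses the standard expansion tr[(I - L) S (I - L)^T] + lambda tr(L L^T) of the error for an arbitrary linear L, which coincides with the trace form of (3.3) at the minimizer L = W.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def inverse(A):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [1.0 if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def wiener_filter(S, lam):
    """W = S (S + lam I)^(-1), as in Equation (3.4)."""
    n = len(S)
    Sx = [[S[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return matmul(S, inverse(Sx))


def mean_square_error(L, S, lam):
    """E||s - L x||^2 for any real linear L, with x = s + w and white noise of variance lam."""
    n = len(S)
    D = [[(1.0 if i == j else 0.0) - L[i][j] for j in range(n)] for i in range(n)]
    return trace(matmul(matmul(D, S), transpose(D))) + lam * trace(matmul(L, transpose(L)))


S = [[1.0, 0.6], [0.6, 1.0]]    # hypothetical signal correlation matrix
lam = 1.0                       # hypothetical noise variance (SNR of 0 dB)
W = wiener_filter(S, lam)
I_minus_W = [[(1.0 if i == j else 0.0) - W[i][j] for j in range(2)] for i in range(2)]

error_trace_form = trace(matmul(I_minus_W, S))          # tr[(I - W) S]
error_general = mean_square_error(W, S, lam)            # same value at L = W
error_half = mean_square_error([[0.5, 0.0], [0.0, 0.5]], S, lam)   # a non-optimal L
print(error_trace_form, error_general, error_half)
```

For this example the eigenvalues of S are 1.6 and 0.4, so the minimum error is 1.6/2.6 + 0.4/1.4, roughly 0.90, whereas the naive choice L = I/2 gives an error of 1.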

The computation $\hat{s} = Wx$ has no particularly efficient algorithm and, further, exact knowledge of
the second-order statistics of the data is required before W can even be written down. For these reasons
we proposed, and discussed at some length in [1], the replacement of W by fast suboptimal group filters.
These latter are special kinds of linear transformations of N-space, having a certain internal symmetry
defined by some group. Specifically, they are right convolution operators on $L^2(G)$, ord(G) = N and,
as such, they are unitarily equivalent, via the group transform associated with G, to a multiplication
operator on $L^2(\hat{G})$, where $\hat{G}$ is the unitary dual. The basic point is that there will be a fast algorithm
for their evaluation whenever there is one for the associated group transform.

If $\mathcal{D}(G)$ denotes the N-dimensional operator algebra of all group filters on G, then the optimal
group filter for our filtering problem is

$$T_W = \arg\min E[\|s - T(x)\|^2] , \qquad (3.5)$$

taken over all $T \in \mathcal{D}(G)$. Of course, this depends on the choice of the group G. Varying G subject to
ord(G) = N will yield optimal filters with a variety of mean-square errors and computational
complexities. In particular, we have

$$e(T_W) = \mathrm{tr}[(I - T_W)\Sigma_s] , \qquad (3.6)$$

which can be compared with the benchmark formula for e(W).

Note that all these formulas implicitly involve the positive parameter $\lambda$, which is related to a
natural signal-to-noise ratio (SNR) expression by

$$\mathrm{SNR} = \frac{\mathrm{tr}(\Sigma_s)}{\lambda N} = \frac{1}{\lambda} , \qquad (3.7)$$

if we assume that $\Sigma_s$ is a correlation matrix, so that $\mathrm{tr}(\Sigma_s) = N$. In this study, we have considered SNR
values between ±10 dB.
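
Since Equation (3.7) ties the noise variance directly to the SNR, a figure quoted in decibels converts to the parameter used in the filter formulas in one line. The sketch below assumes the usual power convention, SNR in dB equal to 10 log10 of the variance ratio; the function name is illustrative.

```python
def noise_variance_from_snr_db(snr_db):
    """lambda = 1 / SNR, with SNR given in decibels (10 * log10 of a variance ratio)."""
    return 10.0 ** (-snr_db / 10.0)


for snr_db in (-10, -5, 0, 5, 10):
    print(snr_db, round(noise_variance_from_snr_db(snr_db), 4))
```

Under this convention the range of ±10 dB corresponds to noise variances between 0.1 and 10 when the signal has unit variance per coordinate.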

All the preceding formulas can, of course, be generalized to situations where the signal s has the
form s = Ah, where h is a random element of some Hilbert space H, A is a linear operator on H of finite
rank, and the noise w is colored. The operator A represents the effect of some kind of measuring device,
or communications channel perhaps, on the unknown element h. The formation of optimal linear
estimates $\hat{h} = L(x)$ in this context is the subject of linear inverse theory, but this added generality exceeds
our present need of a simple linear filtering task on which to test some new algorithms.

As noted previously in Subsection 1.2, all performance measures defined above have the virtue of
being explicit functions of the data covariance matrix $\Sigma_x$ or the signal covariance matrix $\Sigma_s$. We will
put this feature to good use in the next section, by first defining several signal classes in terms of their
covariance structure, and then indicating how representatives of these various classes can be picked at
random, for purposes of a simulation to test the efficacy of our new algorithms.




5.  COMPUTATIONAL  ASPECTS 


As a first attempt to assess the relative merits of our various algorithms, we decided to simply let
them compete with one another in the two particular statistical environments just discussed. This is in
keeping with the general approach of using the computer as a kind of "laboratory," either for the testing
of conjectures or for the generation of data, from which we might attempt to evolve explanatory theories.

A FORTRAN program was prepared to carry out the necessary calculations. First, a data
dimension N is selected. In this work, N was always taken as $2^n$, $n \leq 6$. Next, a signal model class is selected
from AR(p), nonparametric stationary, nonstationary, as discussed in the preceding section. Then, 50
N x N random correlation matrices are generated from this signal class; these provide the test cases on
which our evaluations will be based. Next, one of the four groups of Section 2 is specified, and a
corresponding group transform matrix U is calculated, representing the group transform for a particular
choice of basis. The minimal information for this is the set of values of a complete set of irreducible
representations of the group taken on a set of generators.

At this point, we must decide on the type of signal-processing task on which to test our group
transform. If the task is to be compression or decorrelation, we simply transform each of our correlation
matrices according to Equation (3.1) and then compute the relevant performance measure. This is,
respectively, the geometric mean of the diagonal of the transformed correlation matrix, or the
percentage decrease in overall correlation, defined by Equation (3.2). On the other hand, if the task is to
be Wiener filtering, then a little more work is required. We first supply as an added input a
signal-to-noise figure, SNR, defined by Equation (3.7). Then, we must compute the Wiener filter error from
Equations (3.3) and (3.4), and the optimal group filter error from Equations (3.5) and (3.6). Carrying out
the minimization in Equation (3.5) is not an entirely trivial task, and is described in Section 3.5 of [1].

The result is that a system Ac = b of linear equations must be solved to yield coefficients $\{c_i\}$
in the expression

$$T_W = \sum_i c_i R(h_i)$$

where $h_i$ runs through all the group elements, and $R(\cdot)$ is the regular representation of the group. The
matrix A here has entries of the form

$$a_{ij} = \mathrm{tr}[(P + \mathrm{snr}^{-1} I)\, R(h_j^{-1} h_i)]$$

where P is the input correlation matrix, and from this form it follows that A is positive definite, so the
corresponding linear system is not troublesome numerically.
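
For the cyclic group the whole procedure fits in a few lines, and it can be checked against the Wiener benchmark. The sketch below is an illustration under several assumptions rather than a transcription of the method in [1]: the data are real, the regular representation is realized by cyclic shift matrices, the right-hand side is taken as b_i = tr[P R(h_i)] (obtained here by setting the gradient of the quadratic error to zero), and both test matrices are hypothetical.

```python
def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (M[i][n] - sum(M[i][j] * c[j] for j in range(i + 1, n))) / M[i][i]
    return c


def shifted_trace(M, k):
    """tr[M R(h_k)] for the cyclic group, where R(h_k) is the cyclic shift by k."""
    n = len(M)
    return sum(M[i][(i + k) % n] for i in range(n))


def cyclic_group_filter(P, lam):
    """Coefficients c of T = sum_k c[k] R(h_k) minimizing E||s - T x||^2, and the error."""
    n = len(P)
    Sx = [[P[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    # a_ij = tr[(P + lam I) R(h_j^{-1} h_i)]; for the cyclic group h_j^{-1} h_i is shift i - j.
    A = [[shifted_trace(Sx, (i - j) % n) for j in range(n)] for i in range(n)]
    b = [shifted_trace(P, i) for i in range(n)]
    c = solve(A, b)
    error = shifted_trace(P, 0) - sum(bi * ci for bi, ci in zip(b, c))   # tr[(I - T) P]
    return c, error


def wiener_error(P, lam):
    """tr[(I - W) P] with W = P (P + lam I)^(-1), as in Equations (3.3) and (3.4)."""
    n = len(P)
    Sx = [[P[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    # cols[j] is column j of X = (P + lam I)^(-1) P.
    cols = [solve(Sx, [P[i][j] for i in range(n)]) for j in range(n)]
    tr_PX = sum(P[i][j] * cols[i][j] for i in range(n) for j in range(n))
    return shifted_trace(P, 0) - tr_PX


lam = 1.0
row = [1.0, 0.5, 0.2, 0.5]
circulant = [[row[(j - i) % 4] for j in range(4)] for i in range(4)]   # cyclically invariant
pair = [[1.0, 0.6, 0.0, 0.0], [0.6, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]                    # not cyclically invariant

_, err_circ_group = cyclic_group_filter(circulant, lam)
err_circ_wiener = wiener_error(circulant, lam)
_, err_pair_group = cyclic_group_filter(pair, lam)
err_pair_wiener = wiener_error(pair, lam)
print(err_circ_group, err_circ_wiener, err_pair_group, err_pair_wiener)
```

When the correlation matrix is itself invariant under the group (the circulant case here), the Wiener filter already lies in the group filter algebra and the two errors agree; otherwise the group filter error can only be larger, which is the gap the scatterplots described below are designed to display.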

Finally, our computer program must display the results so obtained in a visually compelling
format, and with useful summary statistics. For the tasks of decorrelation and filtering, we have elected
to display the results in scatterplot form, for one pair of groups at a time, so that the two groups can
be directly compared with one another. Fixing the pair of groups, say $(G_1, G_2)$, a two-dimensional point
$(p_1, p_2)$ is plotted for each sample correlation matrix. In the decorrelation case, $p_i$ is the performance
measure of Equation (3.2); while in the filtering case, $p_i$ is the percentage increase of the optimal
$G_i$-filter error over the Wiener-filter error. So, in the decorrelation case, $p_1 > p_2$ means that $G_1$ gave a
better result than group $G_2$ (i.e., achieved a higher degree of decorrelation), while in the filtering case,
$p_1 > p_2$ means that group $G_1$ gave a worse result than group $G_2$ (since its optimal group filter had a larger
error than that of the other group).

For the task of data compression, we simply plot the average value of $\gamma_Q$ for each group, along
with two standard deviation error bars. Here, the smaller this average and the narrower the error bars,
the better the group has performed. Note that we have also included the corresponding result for the
discrete cosine transform (DCT), even though it is not a group filter, because of its popularity in
engineering applications.

A flowchart of the program operation is presented on the following page.

6. RESULTS AND CONCLUSIONS


We will now illustrate the operation of the program just described by presenting several examples
of output. Most of this output will be for the case of data dimension N = 32, as being typical of the
dimensions we considered ($8 \leq N \leq 64$). After output pertaining to each of the three signal tasks under
consideration, we will offer some conclusions, based on not only the displayed results but all results
relevant to that particular task.

Figures 2 and 3 show the ability of the various transforms to compress data vectors, with the
geometric mean of the variances of the transformed vector as a performance criterion. We see that for
samples from stationary data, the DFT (cyclic group transform) and the DCT are about equally effective,
and close to the theoretical optimum of the DKLT. The other transforms are marginally less effective.
However, for the nonstationary data samples, very little compression is evident by any of the transforms,
even though the DKLT achieves a significant degree of compression.

Figure 2. Data compression summary (average geometric mean). Stationary data: data points = 50, N = 32
(error bars are in increments of 1 standard deviation).

Figure 3. Data compression summary (average geometric mean).


The empirical evidence presented here, that stationary data are highly compressed (relative to the
optimum) by both the DFT and DCT, will not come as any surprise to readers familiar with the transform
coding literature. Recall, for example, the work of Unser [17], wherein it is shown that the DFT and
DCT (among others) are asymptotically equivalent to the DKLT, for data derived from (weakly)
stationary processes.

Perhaps the result for nonstationary data will not be surprising either, given the total lack of
structure in the correlation matrices. In the case of RCMs with random spectra, it can be proved that,
asymptotically, the average geometric mean associated with the DKLT is $e^{-1/2} = 0.61 \ldots$, while that
value associated with any of the transforms seems to tend to 1. Empirically, the discrete W-HT does
best, but not well enough to be of any practical interest. The situation for Gram RCMs is, in a sense,
even more striking, as the geometric mean figure associated with the DKLT seems to be asymptotic to
a value = 0.36 ....

These results tend to suggest that, given a correlation matrix P, the functional $\gamma_Q$ defined on the
unitary group by

$$\gamma_Q(U) = \mathrm{GM}[\mathrm{diag}(U P U^*)] ,$$

which has a maximum value of 1 at U = I, has a steep and sharp minimum at U = DKLT (= matrix
of normalized eigenvectors of P; the fact that the minimum occurs at this U is not in question, but rather
the behavior of $\gamma_Q$ near this minimum). From this perspective, it appears that other methods of
approximating the DKLT, for general non-Toeplitz correlation matrices, along the lines already mentioned in
Section 1.3, are needed.

Although similar conclusions might be expected when we turn to the second task, decorrelation,
there is now an interesting twist. Recall that we are using the ad hoc measure of decorrelation defined
by Equation (3.2). This is just the percentage decrease in the overall magnitude of the off-diagonal
entries in passing from P to the covariance matrix Q = UPU*. The larger this value, the more
decorrelation has taken place under the action of the transform U. Figure 4 shows a typical result for
stationary data: both the W-HT and the DFT achieve a significant level of decorrelation, with the latter
slightly better on average (64 vs 58 percent). For such data the DCT and DFT are essentially comparable,
with a slight advantage to the former (73 vs 64 percent); the W-HT more decisively outperforms the
Russian groups transform (58 vs 40 percent).

Figure 4. Decorrelation data (percent), stationary.


Now let us consider nonstationary data. We first report that the DCT, W-HT, and Russian groups
transform all perform essentially identically. However, the DFT actually produces, on average, a
negative degree of decorrelation, indicating an increase in overall correlation. This is shown in Figures 5
through 7 for the cases of data dimension = 16, 32, and 64, respectively.

Figure 5. Decorrelation data (percent), nonstationary.

Figure 6. Decorrelation data (percent), nonstationary.

Figure 7. Decorrelation data (percent), nonstationary.


From  such  results  we  conclude  that,  as  with  the  case  of  data  compression,  the  DFT/DCT  should 
be  used  with  stationary  data.  However,  for  nonstationary  data  the  DFT  should  definitely  not  be  used. 
The  other  transforms  all  achieve  at  least  a  slight  decorrelation  although,  as  before,  not  enough  to  be  of 
practical  interest.  It  appears  that  the  DFT,  so  well  adapted  to  stationary  data,  is  a  singularly  bad 
transform  otherwise.  We  even  ran  some  experiments  with  random  unitary  matrices  vs  the  DFT  and,  on 
average,  the  DFT  underperformed  the  random  matrices  in  decorrelating  nonstationary  data. 

However, there is a fringe benefit here. The empirical observation that the DFT is so good (resp.,
poor) a decorrelator of stationary (resp., nonstationary) data suggests its usage as a test for stationarity.
That is, given a stretch of data we compute its DFT and then the measure of decorrelation relative to
the original data. A significant level of this figure, say >20 percent, would be taken to imply
stationarity; otherwise, nonstationarity. Our simulation suggests such a test would be 100-percent accurate
for data dimension > 8, but a more rigorous justification should still be given. The problem lies in
passing from the data stretch, or its transform, to a covariance matrix. The associated sampling errors
will act to reduce the test accuracy.
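
The decision rule itself is simple to state in code once a covariance (correlation) matrix is in hand. The sketch below deliberately sidesteps the estimation problem just mentioned: it assumes the correlation matrix is known exactly, uses the 20 percent threshold suggested above, takes the decorrelation measure to be 100 (1 - |||Q|||/|||P|||) with off-diagonal magnitude sums, and is run on two hypothetical 4 x 4 matrices, one circulant (the correlation matrix of a periodic stationary sequence) and one with a single correlated pair of coordinates.

```python
import cmath
import math


def dft_matrix(n):
    """Unitary n-point DFT matrix."""
    scale = 1.0 / math.sqrt(n)
    return [[scale * cmath.exp(-2j * math.pi * a * k / n) for k in range(n)]
            for a in range(n)]


def transform_covariance(U, P):
    """Q = U P U^* for a complex unitary U."""
    n = len(P)
    UP = [[sum(U[a][j] * P[j][l] for j in range(n)) for l in range(n)] for a in range(n)]
    return [[sum(UP[a][l] * U[b][l].conjugate() for l in range(n)) for b in range(n)]
            for a in range(n)]


def off_diagonal_sum(A):
    """Sum of the magnitudes of the off-diagonal entries."""
    n = len(A)
    return sum(abs(A[i][j]) for i in range(n) for j in range(n) if i != j)


def dft_decorrelation_percent(P):
    """Percentage decorrelation achieved by the DFT on the correlation matrix P."""
    Q = transform_covariance(dft_matrix(len(P)), P)
    return 100.0 * (1.0 - off_diagonal_sum(Q) / off_diagonal_sum(P))


def looks_stationary(P, threshold=20.0):
    """Declare stationarity when the DFT decorrelates by more than `threshold` percent."""
    return dft_decorrelation_percent(P) > threshold


row = [1.0, 0.5, 0.2, 0.5]
circulant = [[row[(j - i) % 4] for j in range(4)] for i in range(4)]
pair = [[1.0, 0.6, 0.0, 0.0], [0.6, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

for name, P in (("circulant", circulant), ("pair", pair)):
    print(name, round(dft_decorrelation_percent(P), 1), looks_stationary(P))
```

The circulant matrix is diagonalized exactly by the DFT, so its figure is essentially 100 percent, while for the second matrix the DFT spreads the single correlation over many coordinate pairs and the figure is negative. With real data the matrix would first have to be estimated from the sample, and the resulting sampling error is not modeled here.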

Finally, we turn to the matter of Wiener filtering. Here, we display results only in dimension 32
and for SNR = +5 dB. Recalling from the preceding section that larger values of the filtering performance
measure are worse than smaller values, we see from Figure 8 that, once again, the cyclic group
is better for stationary data than the dyadic group, which in turn is slightly better than the Russian
groups. The average increase in mean-square error for these optimal group filters over the Wiener-filter
error is 3.5, 7.0, and 8.3 percent, respectively.
Figure 8. Group filter error increase over optimal (percent): stationary, SNR = 5 dB


For  nonstationary  data,  we  observe  from  Figures  9  and  10  a  definite,  although  small,  advantage 
to  the  dyadic  group  optimal  transform,  and  a  virtual  equivalence  of  performance  between  this  transform 
and  the  Russian  groups  transform. 
Figure 9. Group filter error increase over optimal (percent): nonstationary, SNR = 5 dB
[Axis label: dyadic group of order 32]

Figure 10. Group filter error increase over optimal (percent): nonstationary, SNR = 5 dB


Based on these and related figures for other dimensions and SNRs, we can draw the following
conclusion.  For  filtering  stationary  data,  the  use  of  ordinary  digital  filters  (with  computation  perhaps 
accelerated  by  means  of  the  associated  FFT)  is  recommended.  For  nonstationary  data,  the  use  of  the 
group  filters  based  on  the  Russian  groups  is  recommended  because  of  the  slightly  lower  error  rate  and 
the  decrease  in  complexity  of  the  associated  group  transform. 

At this time, two other remarks are appropriate. The first concerns the dependence of the foregoing
conclusions on the SNR level. As SNR increases, all the group filtering errors tend to collapse back to
the optimal, which in turn is decreasing to zero [as it must, according to Equations (3.4) and (3.7)]. As
SNR decreases, the Wiener filter W tends to 0 from Equation (3.4) (more generally, the resolvent
operator vanishes at ∞), and the mean-square filtering error tends to N, according to Equation (3.3).
In this situation, while the absolute error increases to N, as just noted, the relative group-filter
error appears to decrease to 0. It should be checked here whether the mathematics of optimal group
filtering necessarily forces the optimal group filter to the zero-operator. We have also noticed a slightly
larger degree of scatter in the scatter plots as SNR decreases, so as to favor the use of the dyadic/Russian
groups filters.
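
The two SNR limits can be illustrated numerically. Equations (3.3), (3.4), and (3.7) are not reproduced in this part of the report, so the sketch below assumes the standard textbook setting instead: a signal with covariance R observed in additive white noise of variance s, for which the Wiener filter R(R + sI)^(-1) has eigenvalues lam/(lam + s) and mean-square error equal to the sum of lam*s/(lam + s) over the eigenvalues lam of R. The eigenvalues used are made up for illustration and are chosen to sum to N = 4, as they would for a correlation matrix with unit diagonal.

```python
def wiener_gains(eigenvalues, noise_var):
    """Eigenvalues of the Wiener filter under the assumed white-noise model."""
    return [lam / (lam + noise_var) for lam in eigenvalues]


def wiener_mse(eigenvalues, noise_var):
    """Mean-square filtering error under the assumed white-noise model."""
    return sum(lam * noise_var / (lam + noise_var) for lam in eigenvalues)


# Hypothetical eigenvalues of a 4 x 4 correlation matrix (they sum to N = 4).
eigenvalues = [2.5, 1.0, 0.4, 0.1]

for noise_var in (1e-6, 1.0, 1e6):
    gains = wiener_gains(eigenvalues, noise_var)
    print(noise_var, max(gains), wiener_mse(eigenvalues, noise_var))
```

With very small noise variance (high SNR) the error approaches zero and the gains approach 1; with very large noise variance (low SNR) the gains approach 0 and the error approaches the trace of R, here N = 4.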

The  second  remark  pertains  to  the  dependence  of  our  conclusions  on  the  data  dimension  N.  The 
basic observation here is that, with increasing N, all the point clouds in the various scatter plots tend to
decrease in size, both visually and as measured by the magnitude of the determinant of the 2 × 2 sample
covariance matrix of the point cloud (sometimes called the generalized sample variance). This indicates
that the effect of the various transforms/filters is becoming constant across the various signal classes. In
the  filtering  studies  we  also  noted  that  the  contracting  point  cloud  tended  to  cluster  along  the  45°  line 
as  N  increased,  suggesting  an  asymptotic  equality  of  performance.  Insofar  as  this  is  truly  valid,  it 
suggests  that  the  main  criterion  for  choice  of  group,  at  least  for  filtering  applications,  is  the  complexity 
of  its  group  transform.  This  observation  thereby  tends  to  validate  the  emphasis  given  in  the  recent 
mathematical  literature  to  transform  complexity  rather  than  error  analysis.  In  particular,  for  the  small 
collection  of  groups  considered  in  this  work,  it  suggests  the  use  of  the  Russian  groups  filters  for  Wiener 
filtering  of  large  dimensional  nonstationary  data. 
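
For concreteness, the generalized sample variance of a two-dimensional point cloud can be computed as follows. This is a minimal sketch with a hypothetical function name, using the usual unbiased (n - 1) normalization; the report does not state which normalization was used.

```python
def generalized_sample_variance(points):
    """Determinant of the 2 x 2 sample covariance matrix of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return sxx * syy - sxy ** 2


square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
shrunk = [(0.5 * x, 0.5 * y) for x, y in square]
on_diagonal = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]

print(generalized_sample_variance(square))
print(generalized_sample_variance(shrunk))
print(generalized_sample_variance(on_diagonal))
```

Shrinking the cloud by a factor of 2 in each coordinate reduces the determinant by a factor of 16, and a cloud lying exactly on the 45° line has determinant zero, so both effects described above drive this measure toward zero.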

The  alert  reader  may  have  noted  the  absence  of  any  mention  of  results  pertaining  to  the  dihedral 
groups,  even  though  this  was  one  of  the  four  classes  of  groups  originally  discussed  in  Section  2.  Because 
it became apparent rather early in our work that the error rates for these groups agreed very closely with
those  of  the  cyclic  groups,  there  was  really  not  much  point  in  our  suggesting  their  use  in  place  of  the 
so  much  more  familiar  cyclic  group  transforms  (DFT)  and  digital  filters. 


7.  SUMMARY 


The  progress  of  this  research  project  to  date  can  perhaps  be  summarized  best  along  the  lines  of 

•  Where  we  were 

•  Where  we  are 

•  Where  we  may  be. 

Prior to the inception of this work, a highly developed theory existed of (weakly) stationary time
series, indexed by either the group of integers or the group of all real numbers, with most of the basic
constructs carrying over to the case where the index set is a general locally compact abelian group. Here,
"basic constructs" include Fourier transform, power spectrum, autocorrelation function, convolution
operators (linear invariant filters), etc. For applications, there was the discrete Fourier transform (DFT)
on the cyclic group of arbitrary order, and various fast algorithms (FFT) for its computation, along with
some asymptotic results describing its effect on long segments of a stationary series. Given this theory
and the efficient computational procedures, there would have seemed to be no particular reason to expect
to discover any significantly better methods with other groups and, indeed, our simulations have supported
this belief. As classical transforms well suited to the treatment of stationary data, we are
including the discrete cosine transform (DCT) here along with the DFT.

Based  on  the  work  and  computer  experiments  reported  here,  we  can  announce  a  small  step  beyond 
the  state  of  knowledge  just  described.  Namely,  when  dealing  with  general  nonstationary  data,  other 
group transforms and filters exhibit both better error rates and lower complexity. These are based on
both  abelian  (e.g.,  dyadic)  and  nonabelian  (e.g.,  Russian)  groups.  Three  further  remarks  are  pertinent 
here.  First,  while  the  decrease  in  error  rates  is  not  high,  we  should  bear  in  mind  that  the  results  are  based 
on  truly  random  and  unstructured  data,  so  there  possibly  might  be  a  more  significant  improvement  for 
data  derived  from  a  particular  nonstationary  model.  Second,  the  alternate  transforms  are  just  as  easy  to 
use  as  the  DFT.  Third,  there  is  a  general  methodology,  based  in  part  on  the  use  of  random  correlation 
matrices,  to  permit  comparison  of  arbitrary  transforms  and  filters.  The  development  of  both  the  theory 
and  the  computer  generation  of  these  matrices  has  been  an  important  by-product  of  this  project. 

Finally,  several  open  questions  of  general  nature  remain  which  we  want  to  record  once  more. 
First,  there  is  a  case  to  be  made  that  of  the  two  familiar  abelian  group  transforms  studied  here,  namely 
the  DFT  and  W-HT,  the  latter  is  somehow  the  one  of  choice  —  if  a  choice  must  be  made.  It  has  by 
far the simpler arithmetic, exhibits almost comparable performance on stationary data, and better performance
across the broader range of nonstationary data. Is there a theoretical basis for this situation?
Second, there is the circle of questions raised near the end of Section 2 concerning the existence of other
"superfast" (and presumably nonabelian) groups of order 2^n, besides the Russian groups, and their
relevance for signal-processing applications. What are the limits of such group-based performance and
complexity? And, as a final speculation, we can wonder whether the theory of group transforms and
filters  is  in  any  reasonable  sense  sufficient  for  the  needs  of  signal  processing.  That  is,  we  have  made 
much  of  the  fact  that  it  offers  a  unified  approach  to  fast  transforms  and  filters,  and  we  conclude  by 
wondering:  Are  these  the  only  such  transforms  and  filters  that  need  be  considered? 
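
The "simpler arithmetic" of the W-HT noted above can be made concrete with a short sketch. The code below is a standard in-place butterfly for the unnormalized Walsh-Hadamard transform in natural (Hadamard) ordering, not code from this project; the function name and the operation counter are illustrative. It uses only additions and subtractions, N log2 N of them for data of length N = 2^n.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (natural ordering).

    Returns the transformed list and the number of additions/subtractions.
    """
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    ops = 0
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = data[i], data[i + half]
                data[i], data[i + half] = a + b, a - b
                ops += 2
        half *= 2
    return data, ops


signal = [3, 1, 4, 1, 5, 9, 2, 6]
spectrum, ops = fwht(signal)
recovered, _ = fwht(spectrum)
print(spectrum, ops)
print([v // len(signal) for v in recovered])
```

Applying the unnormalized transform twice returns the input scaled by N, so the same routine serves as its own inverse up to that factor; no multiplications are needed apart from this final scaling.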
ADDENDUM


During the preparation of this report, two other articles appeared that bear on the "Russian groups"
defined in Section 2. These articles, by Clausen [29] and Rockmore [30], both exhibit a large class of
nonabelian groups G for which the group transform F and its inverse can both be computed in
O(|G| log |G|) operations, where |G| := ord(G). The groups to which this result applies are the so-called
metabelian groups. Recall that G is metabelian if and only if G has a normal abelian subgroup H for which
the quotient group G/H is also abelian. Equivalently, the commutator subgroup of G is abelian.
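
The commutator-subgroup criterion is easy to check by brute force for small groups. The sketch below is illustrative only: it represents group elements as permutations (tuples of images), uses hypothetical function names, and tests the dihedral group of order 8 rather than the Russian groups, whose definition in Section 2 is not repeated here.

```python
from itertools import product


def compose(p, q):
    """Permutation 'p after q': apply q first, then p."""
    return tuple(p[i] for i in q)


def inverse(p):
    """Inverse of a permutation given as a tuple of images."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def generate(generators):
    """Subgroup generated by a nonempty collection of permutations."""
    gens = list(generators)
    identity = tuple(range(len(gens[0])))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements


def is_abelian(group):
    """True if every pair of elements commutes."""
    return all(compose(g, h) == compose(h, g)
               for g, h in product(group, repeat=2))


def commutator_subgroup(group):
    """Subgroup generated by all commutators g h g^-1 h^-1."""
    commutators = {compose(compose(g, h), compose(inverse(g), inverse(h)))
                   for g, h in product(group, repeat=2)}
    return generate(commutators)


def is_metabelian(group):
    """G is metabelian exactly when its commutator subgroup is abelian."""
    return is_abelian(commutator_subgroup(group))


# Dihedral group of order 8, acting on the four corners of a square.
rotation = (1, 2, 3, 0)
reflection = (0, 3, 2, 1)
dihedral8 = generate([rotation, reflection])
print(len(dihedral8), is_abelian(dihedral8), is_metabelian(dihedral8))
```

The dihedral group of order 8 is a nonabelian 2-group whose commutator subgroup has order 2, so it is metabelian. By contrast, the symmetric group on four letters has the (nonabelian) alternating group as its commutator subgroup and is therefore not metabelian.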

The result of Clausen that is particularly relevant here is that any metabelian 2-group has a group
(Fourier) transform F such that both F and its inverse can be computed in at most 1.5 |G| log |G|
operations. The term "operations" means, in the present context, additions/subtractions and (scalar)
multiplications. We use the more neutral symbol F rather than F_G to emphasize the non-uniqueness of
the group transform for nonabelian groups, due to the presence of at least one irreducible representation
of degree ≥ 2. Now, by the remarks made about the Russian groups sequence {BG_n} in Section 2, they
are metabelian 2-groups, so this general theory applies. However, for those groups the constant 1.5 can
be halved as n → ∞; that was the whole point of [27]. So we now have a refinement of our earlier
question about superfast groups: namely, it would seem sensible to restrict our attention to the subclass
of metabelian groups, and to attempt a complete understanding of its potentialities before considering
any further generality.


REFERENCES 


1. R.B. Holmes, "Mathematical foundations of signal processing II. The role of group theory,"
MIT Lincoln Laboratory, Lexington, Mass., Technical Rep. 781 (13 October 1987), DTIC AD-A188482.

2. T.S. Durrani et al. (eds.), Mathematics in Signal Processing, New York: Oxford University Press
(1987).

3. R.M. Young, An Introduction to Nonharmonic Fourier Series, New York: Academic Press
(1980).

4. S. van Eijndhoven and J. de Graaf, A Mathematical Introduction to Dirac's Formalism, New
York: North-Holland, Elsevier (1986).

5. I. Daubechies, A. Grossmann, and Y. Meyer, "Painless nonorthogonal expansions," J. Math.
Phys. 27, 1271 (1986).

6. A. Grossmann, J. Morlet, and T. Paul, "Transforms associated to square integrable group
representations I. General results," J. Math. Phys. 26, 2473 (1985).

7. J. von Neumann, Mathematical Foundations of Quantum Mechanics, Princeton, NJ: Princeton
University Press (1955).

8. D. Gorenstein, "The enormous theorem," Sci. Am. 253, 104 (1985).

9. J.J. Rotman, The Theory of Groups, Boston, Mass.: Allyn and Bacon, Inc. (1965).

10. R.B. Holmes, "On random correlation matrices," MIT Lincoln Laboratory, Lexington, Mass.,
Technical Rep. 803 (28 October 1988), DTIC AD-A202786. To appear in SIAM J. Matrix Anal.
Appl.

11. ______, "On random correlation matrices II. The Toeplitz case," MIT Lincoln Laboratory,
Lexington, Mass., Technical Rep. 816 (23 March 1989), DTIC AD-A208229. To appear
in Commun. Statist.

12. P. Devijver and J. Kittler, Pattern Recognition, Englewood Cliffs, NJ: Prentice Hall (1982).

13. A.I. Solodovnikov, "Formation of optimal Karhunen-Loeve bases by means of a fast transform
algorithm," Voprosy Teor. Sistem Avtomat. Upravlenija 5, 10 (1980).

14. I.E. Kaporin, "A fast algorithm for a class of unitary transformations," Sov. Math. Engl. Transl.
28, 316 (1983).

15. M. Hamidi and J. Pearl, "On the residual correlation of finite-dimensional discrete Fourier
transforms of stationary signals," IEEE Trans. Inf. Theory IT-21, 480 (1975).

16. J. Pearl, "On coding and filtering stationary signals by discrete Fourier transforms," IEEE Trans.
Inf. Theory IT-19, 229 (1973).




17. M. Unser, "On the approximation of the discrete Karhunen-Loeve transform for stationary
processes," Signal Process. 7, 231 (1984).

18. R.M. Gray, "On the asymptotic eigenvalue distribution of Toeplitz matrices," IEEE Trans. Inf.
Theory IT-18, 725 (1972).

19. M. Karpovsky and E. Trachtenberg, "Some optimization problems for convolution systems over
finite groups," Inf. Control 34, 227 (1977).

20. ______, "Filtering in a communication channel by Fourier transforms over finite groups,"
in Spectral Techniques and Fault Detection, M. Karpovsky (ed.),
New York: Academic Press (1985), pp. 179-216.

21. E.A. Trachtenberg, "Construction of group transforms subject to several performance criteria,"
IEEE Trans. Acoust. Speech Signal Process. ASSP-33, 1521 (1985).

22. ______, "A remarkable discrete unitary transform," pp. 637-650 in Reference 2.

23. M.G. Karpovsky, "Fast Fourier transforms on finite non-abelian groups," IEEE Trans. Comput.
C-26, 1028 (1977).

24. M.D. Atkinson, "The complexity of group algebra computations," Theor. Comput. Sci. 5, 205
(1977).

25. T. Beth, "On the computational complexity of the general discrete Fourier transform," Theor.
Comput. Sci. 51, 331 (1987).

26. P. Diaconis and D. Rockmore, "Efficient computation of the Fourier transform on finite groups,"
Stanford University Dept. of Statistics, Stanford, Calif., Technical Rep. 292 (1988).

27. S. Berman and F. Grushko, "The theory of discrete signal processing," Probl. Inf. Transm. 19,
284 (1984).

28. J. Huang and P. Schultheiss, "Block quantization of correlated Gaussian random variables,"
IEEE Trans. Circuits Syst. CS-11, 289 (1963).

29. M. Clausen, "Fast Fourier transforms for metabelian groups," SIAM J. Comp. 18, 584 (1989).

30. D. Rockmore, "Fast Fourier analysis for abelian group extensions," to appear in Adv. Appl. Math.


REPORT  DOCUMENTATION  PAGE 


Form  Approved 
OMB No. 0704-0188




1. AGENCY USE ONLY (Leave blank)

2. REPORT DATE
27 February 1990

3. REPORT TYPE AND DATES COVERED
Technical Report

4. TITLE AND SUBTITLE
Signal Processing on Finite Groups

5. FUNDING NUMBERS
C: F19628-90-C-0002
PE: 63304A, 65301A, 63308A
PR: 34

6. AUTHOR(S)
Richard B. Holmes

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
Lincoln Laboratory, MIT
P.O. Box 73
Lexington, MA 02173-9108

8. PERFORMING ORGANIZATION REPORT NUMBER
Technical Report 873

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)
U.S. Army Strategic Defense Command, Huntsville
Sensor Directorate
P.O. Box 1500
Huntsville, AL 35807-3801

10. SPONSORING/MONITORING AGENCY REPORT NUMBER
ESD-TR-89-267

12a. DISTRIBUTION/AVAILABILITY STATEMENT
Approved for public release; distribution is unlimited.

12b. DISTRIBUTION CODE


13. ABSTRACT (Maximum 200 words)


A unified approach to the design and evaluation of fast algorithms for discrete signal processing is
developed. Based on the theory of finite groups, it hence includes the familiar cases of the fast Fourier and
Walsh-Hadamard transforms. However, the use of noncommutative groups reveals a large variety of novel
methods. Some of these exhibit a superior performance, as measured by both reduced error rates and
computational complexity, on nonstationary data.

The recent history of this subject is reviewed first, followed by a detailed examination of the three
principal ingredients of the present study: the underlying groups, the signal-processing tasks on which the
group-based algorithms are to compete, and the signal models used to define the data environment. Test
results and conclusions then follow, the former being based on the use of random correlation matrices.


14. SUBJECT TERMS
signal processing, finite groups, group transform, group filter, fast Fourier transform,
Walsh-Hadamard transform, random correlation matrix

15. NUMBER OF PAGES

16. PRICE CODE

17. SECURITY CLASSIFICATION OF REPORT
Unclassified

18. SECURITY CLASSIFICATION OF THIS PAGE
Unclassified

19. SECURITY CLASSIFICATION OF ABSTRACT
Unclassified

20. LIMITATION OF ABSTRACT

NSN 7540-01-280-5500

Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. 239-18
298-102